Jasper Slingsby
Uncertainty determines the utility of a forecast:
If the uncertainty in a forecast is too high, then it is of no utility to a decision maker.
If the uncertainty is not properly quantified and presented, it can lead to poor decisions.
This leaves forecasters with four overarching questions:
The utility of a model/forecast depends on how quickly forecast proficiency declines with forecast lead time, combined with the forecast proficiency threshold required for the forecast to be useful. Together these determine the “ecological forecast horizon” (Petchey et al. 2015).
The ecological forecast horizon (from Petchey et al. 2015).
Some forecasts may lose proficiency very quickly, crossing (or starting below) the forecast proficiency threshold. If the forecast loses proficiency more slowly, or the proficiency threshold requirements are lower, the forecast horizon is further into the future.
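A quick numerical sketch of this idea (my own illustration; the exponential skill decay and threshold value are assumptions, not from Petchey et al.):

```python
import numpy as np

def forecast_horizon(proficiency, threshold):
    """Return the index of the last lead time at or above the proficiency
    threshold, or -1 if the forecast starts below it."""
    above = np.where(proficiency >= threshold)[0]
    return int(above[-1]) if above.size else -1

lead_times = np.arange(0, 20)          # e.g. months ahead
proficiency = np.exp(-lead_times / 5)  # assumed exponential decay of skill
print(forecast_horizon(proficiency, threshold=0.5))  # → 3
```

A slower decay (larger denominator) or a lower threshold pushes the horizon further into the future, exactly as in the figure.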
Dietze classifies prediction uncertainty in his book (Dietze 2017a) and subsequent paper (Dietze 2017b) in the form of an equation (note that I’ve spread it over multiple lines):
\[ \underbrace{Var[Y_{t+1}]}_\text{predictive variance} \approx \; \underbrace{stability*uncertainty}_\text{initial conditions} \; + \] \[ \underbrace{sensitivity*uncertainty}_\text{drivers} \; + \] \[ \underbrace{sensitivity*(uncertainty+variability)}_\text{(parameters + random effects)} \; + \] \[ \underbrace{Var[\epsilon]}_\text{process error} \]
If we break the terms down into (something near) English, we get:
The dependent variable:
\[Var[Y_{t+1}] \approx\]
“The uncertainty in the prediction for the variable of interest (\(Y\)) in the next time step (\(t+1\)) is approximately equal to…”
And now the independent variables (or terms in the model):
\[\underbrace{stability*uncertainty}_\text{initial conditions} \; +\]
“The stability multiplied by the uncertainty in the initial conditions, plus”
\[\underbrace{sensitivity * uncertainty}_\text{drivers} \; + \]
“The sensitivity to, multiplied by the uncertainty in, external drivers, plus”
\[\underbrace{sensitivity*(uncertainty+variability)}_\text{(parameters + random effects)} + \]
“The sensitivity to, multiplied by uncertainty and variability in, the parameters, plus”
\[\underbrace{sensitivity*(uncertainty+variability)}_\text{(parameters + random effects)} + \]
“The sensitivity to, multiplied by uncertainty and variability in, the random effects, plus”
\[\underbrace{Var[\epsilon]}_\text{process error}\] “The process error.”
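To make the decomposition concrete, here’s a toy numerical sketch (my own illustration, not Dietze’s code) for a simple linear model \(Y_{t+1} = bY_t + cX_t + \epsilon\), comparing the Monte Carlo predictive variance with the sum of the first-order terms:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000

b_mean, b_sd = 0.8, 0.05     # parameter uncertainty
y0_mean, y0_sd = 10.0, 1.0   # initial-condition uncertainty
x_mean, x_sd = 2.0, 0.5      # driver uncertainty
eps_sd = 0.3                 # process error

b = rng.normal(b_mean, b_sd, n)
y0 = rng.normal(y0_mean, y0_sd, n)
x = rng.normal(x_mean, x_sd, n)
eps = rng.normal(0, eps_sd, n)

y1 = b * y0 + 1.0 * x + eps
total_var = y1.var()

# First-order contributions mirror the equation's structure:
ic_var = b_mean**2 * y0_sd**2     # stability^2 * initial-condition variance
driver_var = 1.0**2 * x_sd**2     # sensitivity^2 * driver variance
param_var = y0_mean**2 * b_sd**2  # sensitivity^2 * parameter variance
approx = ic_var + driver_var + param_var + eps_sd**2
print(round(total_var, 2), round(approx, 2))
```

The two numbers agree closely, showing why the equation is written with “\(\approx\)”: the additive decomposition is a first-order approximation that ignores higher-order interaction terms.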
There are many methods, but it’s worth recognizing that these are actually two steps:
This could be a lecture series of its own. In short, there are five main methods for propagating uncertainty through the model, and most have related methods for propagating it into the forecast (see Table on next slide).
The methods differ in whether they:
They also involve trade-offs between efficiency and flexibility.
| Approach | Distribution | Moments |
|---|---|---|
| Analytical | Variable Transform | Analytical Moments (Kalman Filter) |
| | | Taylor Series (Extended Kalman Filter) |
| Numerical | Monte Carlo (Particle Filter) | Ensemble (Ensemble Kalman Filter) |
Note: It is possible to propagate uncertainty through the model and into your forecast in one step with Bayesian methods, by treating the forecast states as “missing data” values and estimating posterior distributions for them. This would essentially fit with Monte Carlo methods in the table. This approach may not suit all forecasting circumstances though.
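As an illustrative sketch of the numerical (Monte Carlo/ensemble) approach, here’s an ensemble propagated through an assumed toy logistic-growth process model (all values are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)
n_ens, n_steps = 1000, 10

r = rng.normal(0.3, 0.05, n_ens)   # parameter uncertainty (growth rate)
K = 100.0                          # carrying capacity (assumed known)
y = rng.normal(20.0, 2.0, n_ens)   # initial-condition uncertainty

trajectories = [y.copy()]
for _ in range(n_steps):
    # logistic growth plus process error, applied to every ensemble member
    y = y + r * y * (1 - y / K) + rng.normal(0, 1.0, n_ens)
    trajectories.append(y.copy())

forecast = np.array(trajectories)  # shape: (n_steps + 1, n_ens)
# each row is a forecast distribution; summarize with mean and 95% interval
lo, hi = np.percentile(forecast[-1], [2.5, 97.5])
print(round(forecast[-1].mean(), 1), round(lo, 1), round(hi, 1))
```

Because every source of uncertainty is carried by the ensemble members themselves, the spread of the final ensemble *is* the forecast uncertainty — no analytical approximation required.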
Firstly, by working out where it’s coming from. Secondly, by targeting and reducing the sources of uncertainty.
Identifying the sources of uncertainty requires looking at the two ways in which they can be important for the uncertainty in predictions (largely covered in the equation earlier):
Targeting and reducing sources of uncertainty is not always straightforward.
Parameters that are highly uncertain and to which our state variable (\(Y\)) is highly sensitive cause the most uncertainty in predictions.
But, given limited resources, they may not be the best targets for a number of reasons, e.g.
In fact, you can build a model to predict where your effort is best invested by exploring the relationship between sample size and contribution to overall model uncertainty! You can even include economic principles to estimate monetary or person-hour implications. This is called observational design.
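Here’s a minimal sketch of that idea (all numbers are hypothetical): if a parameter’s variance shrinks roughly as \(\sigma^2/n\), we can ask how many extra samples are needed to halve its contribution to forecast uncertainty, and what that would cost.

```python
import numpy as np

sigma = 2.0              # assumed residual SD for this parameter
cost_per_sample = 50.0   # assumed cost per additional sample (illustrative)

def param_variance(n):
    # variance of a mean-type estimator shrinks as sigma^2 / n
    return sigma**2 / n

n_now = 25
target = param_variance(n_now) / 2          # aim to halve the variance
n_needed = int(np.ceil(sigma**2 / target))  # invert sigma^2 / n = target
extra_cost = (n_needed - n_now) * cost_per_sample
print(n_needed, extra_cost)  # → 50 1250.0
```

Note the diminishing returns: halving the variance required doubling the sample size, so each further halving gets progressively more expensive — which is exactly the trade-off observational design makes explicit.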
Bayes’ Rule:
\[ \underbrace{p(\theta|D)}_\text{posterior} \; \propto \; \underbrace{p(D|\theta)}_\text{likelihood} \;\; \underbrace{p(\theta)}_\text{prior} \; \]
The posterior is proportional to the likelihood times the prior.
\[ \underbrace{p(\theta|D)}_\text{posterior} \; \propto \; \underbrace{p(D|\theta)}_\text{likelihood} \;\; \underbrace{p(\theta)}_\text{prior} \; \]
The posterior is the conditional probability of the parameters given the data \(p(\theta|D)\) and provides a probability distribution for the values any parameter can take.
This allows us to represent uncertainty in the model and forecasts as probabilities, which is powerful for indicating the probability of our forecast being correct.
\[ \underbrace{p(\theta|D)}_\text{posterior} \; \propto \; \underbrace{p(D|\theta)}_\text{likelihood} \;\; \underbrace{p(\theta)}_\text{prior} \; \]
The likelihood \(p(D|\theta)\) represents the probability of the data \(D\) given the model with parameter values \(\theta\), and is used in analyses to find the likelihood profiles of the parameters.
This term is also used on its own to find the best estimate of the parameters via Maximum Likelihood Estimation, where we choose the parameter values that maximize the probability of observing the data under the model.
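A minimal illustration of Maximum Likelihood Estimation (my own sketch, using a simple grid search rather than a numerical optimizer): estimating the mean of normally distributed data by finding the candidate value that maximizes the log-likelihood.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=3.0, scale=1.0, size=200)  # simulated data, true mean 3

candidates = np.linspace(0, 6, 601)  # candidate values of theta
# log-likelihood of the data given each theta (sigma fixed at 1,
# constants dropped since they don't change the argmax)
loglik = np.array([-0.5 * np.sum((data - th)**2) for th in candidates])
mle = candidates[np.argmax(loglik)]
print(round(mle, 2))  # close to the true mean of 3.0
```

For a normal likelihood the MLE of the mean is just the sample mean, so the grid search recovers it to grid precision; for realistic models the same maximization is done with a numerical optimizer.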
\[ \underbrace{p(\theta|D)}_\text{posterior} \; \propto \; \underbrace{p(D|\theta)}_\text{likelihood} \;\; \underbrace{p(\theta)}_\text{prior} \; \]
The prior is the marginal probability of the parameters, \(p(\theta)\).
It represents the credibility of the parameter values, \(\theta\), without the data, and is specified using our prior belief of what the parameters should be, before interrogating the data. This provides a formal probabilistic framework for the scientific method, in that new evidence must be considered in the context of previous knowledge, providing the opportunity to update our beliefs.
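Putting the three terms together, here’s a small grid-approximation sketch of Bayes’ Rule (an illustration of the principle, not a method you’d use for real models): the posterior for a binomial success probability \(\theta\) after observing 7 successes in 10 trials, under a flat prior.

```python
import numpy as np

theta = np.linspace(0.001, 0.999, 999)          # grid of parameter values
prior = np.ones_like(theta)                     # flat prior belief p(theta)
k, n = 7, 10                                    # observed 7 successes in 10 trials
likelihood = theta**k * (1 - theta)**(n - k)    # p(D | theta), up to a constant
posterior = likelihood * prior                  # posterior ∝ likelihood * prior
posterior /= posterior.sum() * (theta[1] - theta[0])  # normalize to a density
print(round(theta[np.argmax(posterior)], 1))    # → 0.7 (posterior mode)
```

With a flat prior the posterior mode coincides with the MLE (7/10); an informative prior would pull the posterior toward our prior belief, illustrating how new evidence updates, rather than replaces, previous knowledge.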
Data can enter (or be fused with) a model in a variety of ways. Here we’ll discuss these and then give an example of the Fynbos postfire recovery model used in the practical.
The opportunities for data fusion are linked to model structure, so we’ll revisit how some aspects of model structure change as we move from Least Squares to Maximum Likelihood Estimation to “single-level” Bayes to Hierarchical Bayes and the data fusion opportunities provided by each.
Conceptually (and perhaps over-simplistically), one can think of the changes in model structure as the addition of model layers, each of which provides more opportunities for data fusion.
Least Squares makes no distinction between the process model and the data model.
the process model models the drivers determining the pattern observed (i.e. the model equation you will be familiar with, such as a linear model)
a data model models the observation error or data observation process, i.e. the factors that may cause mismatch between the process model and the data
in least squares the data model can only ever be a normal (also called Gaussian) distribution, because we require homogeneity of variance in order to minimize the sum of squares
the only opportunity to add data to a least squares model is via the process model
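A quick sketch of this structure (my own illustration): the process model is a straight line, and the data model is implicitly Gaussian noise with constant variance around it — the only assumptions ordinary least squares can represent.

```python
import numpy as np

rng = np.random.default_rng(7)
x = np.linspace(0, 10, 50)
# data = process model (line) + data model (normal, constant-variance error)
y = 2.0 + 0.5 * x + rng.normal(0, 0.4, x.size)

# np.polyfit minimizes the sum of squared residuals (ordinary least squares)
slope, intercept = np.polyfit(x, y, deg=1)
print(round(slope, 2), round(intercept, 2))  # close to the true 0.5 and 2.0
```

Notice that nothing in the fit lets us swap in a different error distribution or add a second data source — those opportunities only appear once the data model becomes an explicit layer, as in likelihood-based and Bayesian approaches.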